Abstract

The problem of multiple hypothesis testing with observation control is considered in both fixed 
sample size and sequential settings. In the fixed sample size setting, for binary hypothesis testing, it is 
shown that the optimal exponent for the maximal error probability corresponds to the maximum Chernoff 
information over the choice of controls. It is also shown that a pure stationary open-loop control policy is 
asymptotically optimal within the larger class of all causal control policies. For multihypothesis testing 
in the fixed sample size setting, lower and upper bounds on the optimal error exponent are derived. 
It is also shown through an example with three hypotheses that the optimal causal control policy can 
be strictly better than the optimal open-loop control policy. In the sequential setting, a test based on 
earlier work by Chernoff for binary hypothesis testing is shown to be first-order asymptotically optimal
for multihypothesis testing in a strong sense, using the notion of decision making risk in place of the
overall probability of error. Another test is also designed to meet hard risk constraints while retaining
asymptotic optimality. The role of past information and randomization in designing optimal control 
policies is discussed. 
I. Introduction

The topic of controlled sensing for inference deals primarily with adaptively managing and 
controlling multiple degrees of freedom in an information-gathering system, ranging from the
sensing modality to the physical control of sensors, to achieve a given inference task. Unlike 
in traditional control systems, where the control primarily affects the evolution of the state, in 
controlled sensing, the control affects only the observations. In other words, the goal is not to 
drive the state to some desired level, but for the decision-maker to infer the state accurately by 
shaping the quality of observations. 

Some applications of controlled sensing include, but are by no means limited to, target
detection, tracking and classification (see, e.g., [1], [2]). Castro et al. considered the problem
of airborne laser topographical mapping, where the goal is to find an optimal policy for
sequential redirection of a laser beam to perform quickest detection of topographical step changes.
Controlled sensing policies were also developed for landmine and underwater mine classification
in [4]. In the domain of clinical diagnosis, controlled sensing has been used to choose among
various diagnostic tools to better identify and treat certain diseases [5]. Dynamic sensor selection
and scheduling policies were also developed for tracking and target localization in [6], [7], [8].

In this paper, we focus on the basic inference problem of hypothesis testing, and our goal is to 
find an asymptotically optimal joint design of a control policy and a decision rule (in addition to
a stopping rule for the sequential setting) to decide among the various hypotheses. In particular, 
we consider a Markovian model for the simple hypothesis testing of multiple hypotheses with 
observation control. Prior to making a decision about the hypothesis, the decision-maker can 
choose among different actions to shape the quality of the observations. We consider both the 
fixed sample size and sequential settings of this problem. In the latter setting, the controller can 
adaptively choose to stop taking observations, and the sequential test is fully described by a 
control policy, a stopping rule and a final decision rule. 

A. Relationship to Prior Work 

We begin by discussing prior work in the fixed sample size setting. Tsitsiklis [9] considered the 
problem of quantizing independent observations at geographically separated sensors for multiple 
hypothesis testing. The number of sensors, which is taken to infinity in [9], can be considered to 
be equivalent to the sample size in our controlled sensing problem. Therefore, the quantization 
rules can be considered to be special cases of control actions that can affect the observations at
the output of the various sensors. However, the control actions in the controlled sensing problem
are much more general. Furthermore, the observation control policy in [9] is effectively an
open-loop control policy. In contrast, our main focus in this paper is on temporal observation control
in which the control at each time can be influenced by the past observations.

In the fixed sample size setting, the block channel coding problem with feedback and with
a fixed number of messages studied by Berlekamp [10] can also be considered to be a special
case of the controlled sensing problem. This is because, in the coding problem, the controller
(encoder) has access to the hypothesis (message), whereas in our controlled sensing problem the
controller is not assumed to have access to the hypothesis, which makes our problem more challenging.

The controlled sensing problem is also more general than the multi-channel identification 
problem treated by Mitran and Kavcic [11], in which there is a finite constraint on the number
of past channel outputs available to the input signal selector at each time. In contrast, the causal 
control policies considered herein can depend on the entire past observations, the number of 
which becomes unbounded as the horizon approaches infinity. In related work, Hayashi [12]
considered the discrimination of channels using adaptive methods with unbounded memory, but 
for models with only two channels, i.e., two hypotheses. 

In Section III, we first present a characterization for the optimal error exponent for binary
hypothesis testing with a fixed sample size, showing that a pure stationary open-loop control,
where the control value at each time is fixed and does not depend on past measurements and
past controls, achieves the optimal error exponent among the class of causal controls. In fact,
this result is in agreement with that of Hayashi on discrimination of two channels [12]. The
latter result, which was not known to us when we first presented the optimal error exponent
for binary hypothesis testing with fixed sample size in [13], was motivated by a channel coding
application and turns out to be mathematically equivalent to the result under discussion (see also
Footnote 2). Then, for general multiple hypothesis testing with a fixed sample size, we derive a
characterization for the optimal error exponent achievable by open-loop control. With more than 
two hypotheses, the characterization for the optimal error exponent achievable by causal control 
(which can be a function of past measurements and past controls) is a much more difficult 
problem. In fact, the structure of the optimal control is not known in general. Nevertheless, we 
show through a concrete example with only three hypotheses that the optimal causal control 
policy can be strictly better than the optimal open-loop control policy. We also derive general
lower and upper bounds for the optimal error exponent achievable by causal control. 

We now discuss related work in the sequential setting. The problem of sequential hypothesis
testing without control was introduced by Wald [14], [15] and studied in detail for the binary
hypothesis case. In this work, the optimal expected values of the stopping time were characterized
subject to constraints on the probabilities of error under each hypothesis. It was shown that the
Sequential Probability Ratio Test (SPRT) is optimal, i.e., among all tests with the same power, the
SPRT requires on average the fewest observations. An extension to the multihypothesis
case was considered in [16], where the authors proposed a Multihypothesis SPRT (or MSPRT),
which was later shown to satisfy certain asymptotic optimality conditions [17], [18], [19].

The problem of sequential binary composite hypothesis testing with observation control was
considered by Chernoff [20] and an asymptotically optimal sequential test was presented. While
Wald's SPRT is optimal in the sense that it minimizes the expected values of the stopping time
among all tests for which the probabilities of error do not exceed predefined thresholds [15], a
weaker notion of optimality is adopted in [20]. Specifically, the proposed test is shown to achieve
optimal expected values of the stopping time subject to the constraints of vanishing probabilities
of error under each hypothesis. The sequential test with causal control proposed by Chernoff
can only be proven to be asymptotically optimal under a set of positivity constraints on
the Kullback-Leibler distances as defined in (13). Bessler [21] generalized Chernoff's work to
general multiple hypothesis testing but also imposed the same type of assumption on the model.$^{1}$

Burnasev [22] considered the problem of sequential discrimination of multiple hypotheses with
control of observations under a different information structure. It is important to note that the
controlled sensing problem that we consider is fundamentally different from Burnasev's problem.
Unlike [22], where the control actions are functions of the underlying hypothesis, in [20] and the
setting we consider herein the control actions cannot be functions of the unknown hypothesis.
In that sense, the problem considered in [22] has a simpler structure since the controller knows
the underlying hypothesis. This knowledge simplifies both the optimization of control policies
and their performance analysis. When the hypothesis is unknown to the controller, as in
the controlled sensing problem considered herein, the controller has to base its control actions
on estimates of the unknown hypothesis.

1 We would like to thank the anonymous reviewer for pointing us to the generalization of Chernoff's test to the M > 2 case
in Bessler's dissertation, which we were unaware of at the time of initial submission of the manuscript.

A Bayesian version of this sequential problem (with observation control) was considered by
the authors in [23] in the non-asymptotic regime. Since the optimal policy is generally difficult
to characterize, certain conditions (Blackwell ordering [24]) were identified under which the
optimal control is an open-loop control. The main focus of [23], [25] has been on trying to
solve the underlying dynamic program and finding the structure of optimal solutions, a task that
is only possible in some special cases. In contrast, our work mostly focuses on performance
analysis and on establishing asymptotic optimality of proposed control policies.

In Section IV, we extend the results in [20], [21] in several directions. First, we show that
the sequential test in [20], [21] is asymptotically optimal in a strong sense, using the notion of
frequentist risks instead of the probability of error. Second, we dispense with the positivity
assumption (13) by using a modified test, thereby completing the achievability proof of asymptotic
optimality from [20], [21] without this critical assumption. Third, we design another test
to meet hard risk constraints while retaining asymptotic optimality.

B. Paper Outline 

The remainder of the paper is organized as follows. In Section II, we specify the general
notations and assumptions that will be adopted throughout the paper. Our problem formulations
and results for the fixed sample size setting and the sequential setting, together with a summary
of our contributions in each case, are given in Sections III and IV, respectively. An example is
provided in Section V. A discussion is provided in Section VI, and conclusions are given in
Section VII. All proofs are relegated to the appendices.

II. Preliminaries 

Throughout the paper, random variables are denoted by capital letters and their realizations 
are denoted by the corresponding lower-case letters. 

Consider hypothesis testing with $M$ hypotheses, with the set of hypotheses denoted by $\mathcal{M} =
\{0, \ldots, M-1\}$. At each time step, the observation takes values in $\mathcal{Y}$ and the control takes
values in $\mathcal{U}$. We assume that the control alphabet $\mathcal{U}$ is finite. The observation alphabet $\mathcal{Y}$ is
a measurable space; it can be either continuous, i.e., a finite-dimensional Euclidean space, or
discrete. Under each hypothesis $i \in \mathcal{M}$, and at each time $k$, conditioning on the event that the
current control $U_k$ has value $u$, the current observation $Y_k$ is assumed to be conditionally
independent of past observations and past controls $(y^{k-1}, u^{k-1}) \triangleq ((y_1, \ldots, y_{k-1}), (u_1, \ldots, u_{k-1}))$.
We refer to this (conditionally) memoryless assumption as the stationary Markovity assumption.

The following technical assumptions are made throughout the paper. First, for every $u \in \mathcal{U}$,
we assume that the distributions of the observations under each hypothesis $i \in \mathcal{M}$ are absolutely
continuous with respect to a common distribution $\mu_u$ on $\mathcal{Y}$. Consequently, for every $u \in \mathcal{U}$ and
every $i \in \mathcal{M}$, there exists a probability density function (pdf)/probability mass function (pmf)
$p_i^u$ (depending on whether $\mu_u$ is a continuous or discrete distribution, respectively) such that for
every measurable set $A \subset \mathcal{Y}$,

$$P_i^u\{Y \in A\} = \int_A p_i^u(y)\, d\mu_u(y), \quad u \in \mathcal{U}, \tag{1}$$

where the notation $P_i^u$ denotes the probability measure with respect to the distribution $p_i^u$. Second,
we also assume that for every $u \in \mathcal{U}$ and every pair $i, j \in \mathcal{M}$, $i \neq j$,

$$E_i^u\left[\left(\log \frac{p_i^u(Y)}{p_j^u(Y)}\right)^2\right] < \infty, \tag{2}$$

where the notation $E_i^u$ denotes an expectation with respect to $p_i^u$. Note that it follows from (2)
that for every $u \in \mathcal{U}$ and every pair $i, j \in \mathcal{M}$, $i \neq j$, $p_i^u$ is absolutely continuous with respect
to $p_j^u$. However, for $u, u' \in \mathcal{U}$, $u \neq u'$, and $i, j \in \mathcal{M}$, $p_i^u$ need not be absolutely continuous
with respect to $p_j^{u'}$. For a finite $\mathcal{Y}$, the combination of (2) and the first assumption is tantamount
to the condition that all pmfs in the collection $\{p_i^u\}_{i \in \mathcal{M}}$ have the same support. However, the
support could be different for different values of $u$.
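
For a finite observation alphabet, the common-support condition just described is easy to check mechanically. The following is only a minimal sketch, assuming a hypothetical representation in which `model[u]` lists the pmfs of the observation under each hypothesis for control `u`; the function names and both toy models are illustrative and not part of the paper.

```python
def same_support(pmfs):
    """True if all pmfs (lists over one finite alphabet) are positive on the same symbols."""
    supports = [frozenset(y for y, prob in enumerate(pmf) if prob > 0) for pmf in pmfs]
    return all(s == supports[0] for s in supports)


def satisfies_support_condition(model):
    """model[u] is the list of pmfs of Y under each hypothesis for control u.

    The support may differ across controls, but not across hypotheses
    for a fixed control.
    """
    return all(same_support(pmfs) for pmfs in model.values())


# Hypothetical toy model: the support is {0, 1} under control "a" and {1, 2} under "b".
TOY_MODEL = {
    "a": [[0.5, 0.5, 0.0], [0.2, 0.8, 0.0]],
    "b": [[0.0, 0.3, 0.7], [0.0, 0.6, 0.4]],
}

# Under this control, hypothesis 0 can never produce symbol 1 but hypothesis 1 can.
BAD_MODEL = {"a": [[1.0, 0.0], [0.5, 0.5]]}

print(satisfies_support_condition(TOY_MODEL))
print(satisfies_support_condition(BAD_MODEL))
```

The first model passes even though its two controls have different supports, which is exactly the situation the last sentence above allows.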

III. Fixed Sample Size Setting 

In this section, we first consider the setting wherein the sample size is fixed a priori, i.e., it 
does not depend on specific realizations of the observations and controls. 

We consider two classes of control policies based on two information patterns. The first is
the open-loop control policy where the (possibly randomized) control sequence $(U_1, \ldots, U_n)$ is
assumed to be independent of the observations $(Y_1, \ldots, Y_n)$. The second is the causal control
policy where at each time $k$, the control $U_k$ can be any (possibly randomized) function of past
observations and past controls, i.e., $U_k$, $k = 2, 3, \ldots, n$, is described by an arbitrary conditional
pmf $q_k(u_k | y^{k-1}, u^{k-1})$, and $U_1$ is distributed according to a pmf $q_1(u_1)$. If all these (conditional)
pmfs are point-mass distributions, i.e., the current control is a deterministic function of past
observations and past controls, then the resulting policy is a pure control policy. Under the
aforementioned stationary Markovity assumption, the joint probability distribution function of
$(Y^n, U^n)$ under each hypothesis $i$, denoted by $p_i(y^n, u^n)$, can be written as

$$p_i(y^n, u^n) \triangleq q_1(u_1) \prod_{k=1}^{n} p_i^{u_k}(y_k) \prod_{k=2}^{n} q_k(u_k | y^{k-1}, u^{k-1}). \tag{3}$$

For open-loop control, $q_k(u_k | y^{k-1}, u^{k-1})$ is (conditionally) independent of $y^{k-1}$; hence,

$$p_i(y^n, u^n) = \left(\prod_{k=1}^{n} p_i^{u_k}(y_k)\right) \left(q_1(u_1) \prod_{k=2}^{n} q_k(u_k | u^{k-1})\right) = \left(\prod_{k=1}^{n} p_i^{u_k}(y_k)\right) q(u^n). \tag{4}$$
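
One consequence of the factorization (3) is that the policy terms $q_k$ do not depend on the hypothesis, so they cancel in any likelihood ratio between two hypotheses. The sketch below illustrates this numerically. The two-control binary model, the particular causal policy and all function names are assumptions made up for this illustration.

```python
import math

# Hypothetical model: MODEL[u][i][y] is the probability of observing y
# under hypothesis i when control u is applied.
MODEL = {
    "a": [[0.8, 0.2], [0.3, 0.7]],
    "b": [[0.6, 0.4], [0.5, 0.5]],
}


def causal_policy(ys_past, us_past):
    """Hypothetical randomized causal policy q_k(. | y^{k-1}, u^{k-1})."""
    if not ys_past:  # q_1(u_1): no past information yet
        return {"a": 0.5, "b": 0.5}
    return {"a": 0.9, "b": 0.1} if ys_past[-1] == 0 else {"a": 0.1, "b": 0.9}


def log_joint(i, ys, us, model, policy):
    """log p_i(y^n, u^n) as in (3): policy terms plus observation terms."""
    total = 0.0
    for k, (y, u) in enumerate(zip(ys, us)):
        total += math.log(policy(ys[:k], us[:k])[u])  # hypothesis-independent
        total += math.log(model[u][i][y])             # depends on hypothesis i
    return total


def log_observation_part(i, ys, us, model):
    """Sum of the observation log-likelihoods only."""
    return sum(math.log(model[u][i][y]) for y, u in zip(ys, us))


ys, us = [0, 1, 1, 0], ["a", "a", "b", "b"]
llr_joint = log_joint(0, ys, us, MODEL, causal_policy) - log_joint(1, ys, us, MODEL, causal_policy)
llr_obs = log_observation_part(0, ys, us, MODEL) - log_observation_part(1, ys, us, MODEL)
print(llr_joint, llr_obs)
```

The two log-likelihood ratios agree up to rounding, which is why an ML estimate of the hypothesis can be computed from the observation terms alone.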

After $n$ observations, a decision is made about the hypothesis according to the rule
$\delta : \mathcal{Y}^n \times \mathcal{U}^n \to \mathcal{M}$ with maximal error probability
$e\left(\{q_k\}_{k=1}^{n}, \{p_i^u\}_{i \in \mathcal{M}}^{u \in \mathcal{U}}, \delta\right) \triangleq \max_{i \in \mathcal{M}} P_i\{\delta \neq i\}$.
Note that for a pure control policy, $u^n$ is either a fixed sequence (pure open-loop control) or a
deterministic function of the observations $y^n$ (pure causal control). Consequently, when a pure
control policy is adopted, it suffices to consider a decision rule that is a function only of the
observations, i.e., $\delta(y^n, u^n) = \delta(y^n)$. The combination of a control policy and a decision rule
will be referred to as a test. The asymptotic quantities of interest will be the largest exponent of
the maximal error probability achievable by open-loop control, denoted by $\beta_{OL}$, and by causal
control, denoted by $\beta_C$, respectively. In particular,

$$\beta_{OL} \triangleq \lim_{n \to \infty}\ \sup_{\delta,\, q(u^n)} -\frac{1}{n} \log\left(e\left(q(u^n), \{p_i^u\}_{i \in \mathcal{M}}^{u \in \mathcal{U}}, \delta\right)\right),$$

$$\beta_C \triangleq \lim_{n \to \infty}\ \sup_{\delta,\, q_1(u_1),\, \{q_k(u_k | y^{k-1}, u^{k-1})\}_{k=2}^{n}} -\frac{1}{n} \log\left(e\left(\{q_k\}_{k=1}^{n}, \{p_i^u\}_{i \in \mathcal{M}}^{u \in \mathcal{U}}, \delta\right)\right).$$

It follows immediately from these definitions that $\beta_{OL} \leq \beta_C$, as the information pattern
associated with causal control is more informative than that associated with open-loop control. We
also seek to characterize the optimal control policies that achieve the optimal error exponents.
Note that because the number of hypotheses is fixed, we can consider a Bayesian probability of
error (with respect to any prior probability distribution of the hypothesis) instead of the maximal
one in the definitions of the optimal error exponents without changing their optimal values.
Before moving on to the technical part, we first summarize our contributions in this section.
• We derive a characterization for the optimal error exponent achievable by open-loop control
for general multiple hypothesis testing with a fixed sample size (see Footnote 3 explaining
the connection between this result and previous work [11]).

• We propose a test for general multiple hypothesis testing with a fixed sample size using a
causal control policy that chooses the control value based on a suitable Chernoff information.
We also derive general lower and upper bounds for the optimal error exponent achievable by
causal control that hold for any number of hypotheses, and illustrate through a canonical
example with only three hypotheses that causal control can outperform open-loop control.

A. The Case of Binary Hypothesis Testing (M = 2) 

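For $p_1$ and $p_2$ that are pdfs/pmfs on $\mathcal{Y}$ with respect to a common distribution $\lambda$, the
Kullback-Leibler (KL) distance of $p_1$ and $p_2$, denoted by $D(p_1 \| p_2)$, is defined as

$$D(p_1 \| p_2) \triangleq \int_{\mathcal{Y}} p_1(y) \log \frac{p_1(y)}{p_2(y)}\, d\lambda(y).$$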



We start with the following characterizations for the largest error exponents achievable by 
open-loop control and by causal control in the case of binary hypothesis testing. 
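For any $u \in \mathcal{U}$ and any $s \in [0, 1]$, consider the following pdf/pmf

$$b_s^u(y) \triangleq \frac{p_0^u(y)^s\, p_1^u(y)^{1-s}}{\int_{\mathcal{Y}} p_0^u(y)^s\, p_1^u(y)^{1-s}\, d\mu_u(y)}, \tag{5}$$

and also let

$$s^*(u) \triangleq \arg\max_{s \in [0,1]} -\log\left(\int_{\mathcal{Y}} p_0^u(y)^s\, p_1^u(y)^{1-s}\, d\mu_u(y)\right). \tag{6}$$

Proposition 1: For $M = 2$, it holds that$^{2}$

$$\beta_{OL} = \beta_C = \max_{u \in \mathcal{U}} \max_{s \in [0,1]} -\log\left(\int_{\mathcal{Y}} p_0^u(y)^s\, p_1^u(y)^{1-s}\, d\mu_u(y)\right) \tag{7}$$

$$= \max_{u \in \mathcal{U}} D\left(b_{s^*(u)}^u \,\|\, p_0^u\right) = \max_{u \in \mathcal{U}} D\left(b_{s^*(u)}^u \,\|\, p_1^u\right). \tag{8}$$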

2 Although this result is mathematically equivalent to [12, Theorem 1] on discrimination of two channels, we point out
here that the term "discrimination" was first coined by Burnasev in [22] in the context of channel coding. In Burnasev's
discrimination problem, he explicitly separated the roles of the controller and final decision maker, which correspond to the
encoder and decoder, respectively, in the channel coding problem. In particular, this correspondence led him to consider the
discrimination model in which only the controller knows the hypothesis. Later, Hayashi adopted this term in [12] with only two
hypotheses, but dropped the assumption that the controller knows the hypothesis. This confusion regrettably caused us to miss
Hayashi's work in our first conference submission [13], even though we had been fully aware of Burnasev's work at the time.
Remark 1: For each fixed $u \in \mathcal{U}$, the quantity

$$C(p_0^u, p_1^u) \triangleq \max_{s \in [0,1]} -\log\left(\int_{\mathcal{Y}} p_0^u(y)^s\, p_1^u(y)^{1-s}\, d\mu_u(y)\right) \tag{9}$$

is called the "Chernoff information" of $p_0^u$ and $p_1^u$. Consequently, Proposition 1 (cf. (7)) states
that the optimal error exponent is the maximum Chernoff information over the choice of controls.

Remark 2: It follows from Proposition 1 and the result on the Chernoff information for i.i.d.
observations that the above optimal error exponent is achievable by a pure stationary
open-loop control sequence in which, for every $k = 1, \ldots, n$, $u_k = u^*$, where $u^*$ is the maximizer
associated with the right-side of (7) (or, identically, with the two (maximizing) optimization
problems in (8)). In particular, information from the past and randomization are superfluous for
attaining the best error exponent for binary hypothesis testing with a fixed sample size.
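
To make Proposition 1 and Remarks 1 and 2 concrete, the following sketch evaluates the Chernoff information for each control of a small binary model, picks the maximizing control $u^*$, and checks the two KL characterizations in (8) numerically. It is a hedged illustration only: the model is hypothetical, the pmfs are assumed strictly positive, and the maximization over $s$ uses a plain ternary search (valid because the objective is concave in $s$) rather than any method from the paper.

```python
import math


def chernoff_objective(p0, p1, s):
    """-log sum_y p0(y)^s p1(y)^(1-s), a concave function of s on [0, 1]."""
    return -math.log(sum(a ** s * b ** (1.0 - s) for a, b in zip(p0, p1)))


def chernoff_information(p0, p1, iters=100):
    """Return (C(p0, p1), s*) using ternary search over s in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if chernoff_objective(p0, p1, m1) < chernoff_objective(p0, p1, m2):
            lo = m1
        else:
            hi = m2
    s_star = 0.5 * (lo + hi)
    return chernoff_objective(p0, p1, s_star), s_star


def tilted(p0, p1, s):
    """The pmf b_s of (5), proportional to p0(y)^s p1(y)^(1-s)."""
    weights = [a ** s * b ** (1.0 - s) for a, b in zip(p0, p1)]
    z = sum(weights)
    return [w / z for w in weights]


def kl(p, q):
    """KL distance D(p || q) between two pmfs on the same finite alphabet."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def best_control(model):
    """Control maximizing the Chernoff information, as in (7)."""
    return max(model, key=lambda u: chernoff_information(*model[u])[0])


EPS = 0.1
# Hypothetical model: MODEL[u] = (p_0^u, p_1^u) on a binary alphabet.
MODEL = {
    "a": ([1 - EPS, EPS], [EPS, 1 - EPS]),
    "b": ([0.5, 0.5], [0.4, 0.6]),
}

u_star = best_control(MODEL)
c_star, s_star = chernoff_information(*MODEL[u_star])
b_star = tilted(*MODEL[u_star], s_star)
print(u_star, c_star, kl(b_star, MODEL[u_star][0]), kl(b_star, MODEL[u_star][1]))
```

For the symmetric pair under control "a", the maximizer is $s^* = 1/2$ and the Chernoff information has the closed form $-\log\left(2\sqrt{\epsilon(1-\epsilon)}\right)$, which gives a convenient sanity check on the search.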

B. The Case of Multiple Hypothesis Testing (M > 2) 

1) Open-loop Control: Our first theorem pertains to the situation with open-loop control.
Theorem 1: For $M > 2$, it holds that$^{3}$

$$\beta_{OL} = \max_{q(u)}\ \min_{i \neq j}\ \max_{s \in [0,1]} -\sum_{u} q(u) \log\left(\int_{\mathcal{Y}} p_i^u(y)^s\, p_j^u(y)^{1-s}\, d\mu_u(y)\right), \tag{10}$$

where the left-most maximization is over all pmfs $q$ on $\mathcal{U}$ and the minimization is over all pairs
of hypotheses. Furthermore, $\beta_{OL}$ is achievable by pure (non-randomized) control.

3 This result is complementary to [11, Theorem 5]. First, our result is more general in that it applies to general observation
alphabets, not just the finite case (subject to the conditions stated in Section II). This is because the proof of our result relies
only on the weak martingale convergence result (and some basic calculus facts), which in turn can be derived from just the Markov
inequality. In contrast, the proof of [11, Theorem 5] relies on the machinery of the "method of types" [26], which depends
critically on the finiteness of the observation alphabet. In addition, for finite observation alphabets, our formula (10) for $\beta_{OL}$
is simpler than that in [11, Theorem 5] because it involves maximization over a single real-valued spurious parameter $s$ instead
of minimization over a conditional distribution as in [11, Theorem 5].

2) Causal Control: A natural question that arises now is whether causal control can yield a
larger error exponent than open-loop control when $M > 2$. The answer will be shown to be
affirmative even for $M = 3$. To this end, we now propose a test with pure causal control (we
show in Theorem 2 below that pure causal control does achieve the optimal error exponent).

Our test admits the following recursive description and is based on the use of the posterior
distribution of the hypothesis as a sufficient statistic. Having obtained the first $k$ observations
$y^k$, we find the maximum likelihood (ML) estimate of the hypothesis, denoted by $\hat{i}_k(y^k) =
\arg\max_{i \in \mathcal{M}} p_i(y^k)$.$^{4}$ We adopt a pure control policy wherein $u_{k+1} \in \mathcal{U}$ is selected as

(11)

At time $n$, the decision rule is specified as $\delta(y^n, u^n) = \hat{i}_n$. The proposed test follows the celebrated
separation principle between estimation and control; while estimating the ML hypothesis is
carried out online, the control is chosen based on a stationary deterministic mapping from the
space of posterior distributions to the control space, and hence, the mapping can be fully specified
offline. It will be shown in Section V that for the special example with only three hypotheses,
this proposed test is superior to the best open-loop control. In general, we still do not know
the structure of the optimal causal control, and characterizing the optimal error exponent for
causal control is a hard problem even for $M = 3$. Nevertheless, we derive precise bounds on the
optimal error exponent that are applicable for any $M > 2$. Note that the optimal error exponent
achievable by open-loop control as characterized in Theorem 1 already serves as a lower bound
for the optimal error exponent achievable by causal control. We also derive a new lower bound
and an upper bound for the optimal error exponent for causal control. These bounds are stated in
Theorem 2 for the fixed sample size setting with $M > 2$. Although the lower bound of Theorem
2 for $\beta_C$ holds only for a finite observation alphabet $\mathcal{Y}$, the upper bound in Theorem 2 and all
the previous results are valid for an arbitrary $\mathcal{Y}$ (subject to assumptions (1) and (2) in Section
II). As mentioned in Section II, for a finite $\mathcal{Y}$, we assume that for every $u \in \mathcal{U}$, the pmfs in the
collection $\{p_i^u\}_{i \in \mathcal{M}}$ have the same support.
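
Since the open-loop exponent of Theorem 1 serves as a baseline lower bound for $\beta_C$, it can be useful to evaluate it numerically for small models. The sketch below approximates the max-min-max in (10) for a finite alphabet by a grid search over the pmf $q$ and a ternary search over $s$ (the inner objective is concave in $s$). The three-hypothesis model, the grid resolution and all names are assumptions chosen for illustration; they are not claimed to reproduce the example of Section V.

```python
import math
from itertools import combinations, product


def weighted_objective(q, pmfs_i, pmfs_j, s):
    """-sum_u q(u) log sum_y p_i^u(y)^s p_j^u(y)^(1-s), concave in s."""
    total = 0.0
    for qu, pi, pj in zip(q, pmfs_i, pmfs_j):
        total -= qu * math.log(sum(a ** s * b ** (1.0 - s) for a, b in zip(pi, pj)))
    return total


def max_over_s(q, pmfs_i, pmfs_j, iters=60):
    """Ternary search for the inner maximization over s in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        f1 = weighted_objective(q, pmfs_i, pmfs_j, m1)
        f2 = weighted_objective(q, pmfs_i, pmfs_j, m2)
        if f1 < f2:
            lo = m1
        else:
            hi = m2
    return weighted_objective(q, pmfs_i, pmfs_j, 0.5 * (lo + hi))


def simplex_grid(dim, steps):
    """All pmfs on dim points whose entries are multiples of 1/steps."""
    for head in product(range(steps + 1), repeat=dim - 1):
        if sum(head) <= steps:
            yield [h / steps for h in head] + [(steps - sum(head)) / steps]


def exponent_for(q, model, controls):
    """Value of the min over pairs in (10) for a fixed pmf q over controls."""
    num_hyp = len(model[controls[0]])
    return min(
        max_over_s(q, [model[u][i] for u in controls], [model[u][j] for u in controls])
        for i, j in combinations(range(num_hyp), 2)
    )


def open_loop_exponent(model, controls, steps=12):
    """Grid approximation of (10); returns (best value, best q)."""
    best_val, best_q = -math.inf, None
    for q in simplex_grid(len(controls), steps):
        val = exponent_for(q, model, controls)
        if val > best_val:
            best_val, best_q = val, q
    return best_val, best_q


EPS = 0.1
P, PBAR = [1 - EPS, EPS], [EPS, 1 - EPS]
# Hypothetical model: control u only separates "its own" hypothesis from the rest.
MODEL = {"a": [PBAR, P, P], "b": [P, PBAR, P], "c": [P, P, PBAR]}
CONTROLS = ["a", "b", "c"]

beta_ol, q_best = open_loop_exponent(MODEL, CONTROLS)
print(beta_ol, q_best)
```

In this toy model any single fixed control leaves one pair of hypotheses indistinguishable, so the grid search settles on mixing the three controls equally. Only unordered pairs are enumerated because swapping $i$ and $j$ amounts to replacing $s$ by $1-s$.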

For any pmf $\nu$ on $\mathcal{M}$, any $u \in \mathcal{U}$, let $\nu \circ p^u(\cdot)$ denote the pmf/pdf (on $\mathcal{Y}$) $\sum_{i} \nu(i)\, p_i^u(y)$.
Theorem 2: For every finite $\mathcal{Y}$ and every $M > 2$, it holds that

$$\cdots \ \leq\ \beta_C \ \leq\ \min_{i \neq j}\ \max_{u \in \mathcal{U}}\ \max_{s \in [0,1]} -\log\left(\int_{\mathcal{Y}} p_i^u(y)^s\, p_j^u(y)^{1-s}\, d\mu_u(y)\right), \tag{12}$$

where the outer supremum for the argument of $-\log$ in the lower bound is over pmfs $\nu$ on
$\mathcal{M}$ that are not point-mass distributions and the outer minimization for the upper bound in (12)
is over all pairs of hypotheses $i, j$, $i \neq j$. Furthermore, as for $\beta_{OL}$, the exponent $\beta_C$ is also
achievable by pure control without any randomization.

4 In case of ties, we pick, say, the hypothesis with the least numerical value.

Remark 3: Although the optimization for the lower bound in (12) can be quite difficult to
solve, it can be handled off-line, since it only depends on the model $\{p_i^u,\ i \in \mathcal{M},\ u \in \mathcal{U}\}$. In the
example of Section V, it is shown that the value of this lower bound is strictly larger than $\beta_{OL}$.

IV. Sequential Setting 

In the previous section, we considered tests with a fixed sample size. In this section, we 
consider a different setting in which the controller can adaptively decide, based on the
realizations of past observations and past controls, whether to continue collecting new observations,
thereby deferring making a final decision about the hypothesis until a later time, or to stop taking
observations and make the final decision. In this setting, the goal is to design a sequential test 
to achieve the optimal tradeoff between reliability, in terms of probability of error, and delay 
or cost, in terms of the expected sample size needed for decision making. Unlike in the fixed 
sample size setting in which the asymptotic analysis of tests with open-loop control is easier than 
that of tests with causal control, in the sequential setting, the contrary situation seems to hold. In 
particular, as we show below, the adoption of randomized causal control in the sequential setting 
enables the simultaneous minimization of the expected sample sizes under the M hypotheses as 
the error probability vanishes. In contrast, an analogous characterization for open-loop control 
remains elusive. By virtue of this fact, we only consider causal control in the sequential setting. 

We now summarize our contributions in the sequential setting. 

• The existing sequential test originally proposed by Chernoff [20] for binary composite
hypothesis testing, and extended to the multihypothesis setting by Bessler [21], can only
be proved to be asymptotically optimal under a certain assumption on the distributions
((13) below). We first show that under the same assumption this test, which we refer to as
the Chernoff test, is asymptotically optimal in a strong sense, using the notion of decision
making risk in place of the overall probability of error.

• We dispense with the aforementioned assumption by using a modified version of the
Chernoff test described in Appendix B.II, where we outline the achievability proof of
asymptotic optimality without (13).

• We design another test to meet hard risk constraints while retaining asymptotic optimality.

Let $\mathcal{F}_k$ denote the $\sigma$-field generated by $(Y^k, U^k)$. A sequential test $\gamma = (\phi, N, \delta)$ consists
of a causal observation control policy $\phi$, an $\mathcal{F}_k$-stopping time $N$ representing the (random)
number of observations before the final decision, and the decision rule $\delta = \delta(Y^N, U^N)$.
Akin to the paragraph containing (3), the causal control policy $\phi$ is described by the pmfs
$q_1(u_1)$, $\{q_k(u_k | y^{k-1}, u^{k-1})\}_{k=2}^{\infty}$.

A. The Chernoff Test 

We first present the Chernoff test [20], [21] for sequential design of experiments with multiple
hypotheses. The proof of asymptotic optimality of this test requires the following technical
assumption which was also imposed in [20], [21]: For every $u \in \mathcal{U}$, $0 \leq i < j \leq M-1$,

$$D(p_i^u \| p_j^u) > 0. \tag{13}$$

The Chernoff test admits the following sequential description. Having fixed the control policy
up to time $k$ and obtained the first $k$ observations and control values $y^k, u^k$, if the controller
decides to continue taking more observations, then at time $k+1$, a randomized control policy
is adopted wherein $U_{k+1} \in \mathcal{U}$ is drawn from the following distribution

$$q(u) = q(u | \hat{i}_k) = \arg\max_{q(u)}\ \min_{j \in \mathcal{M} \setminus \{\hat{i}_k\}}\ \sum_{u} q(u)\, D\left(p_{\hat{i}_k}^u \,\|\, p_j^u\right), \tag{14}$$

where $\hat{i}_k = \arg\max_{i \in \mathcal{M}} p_i(y^k, u^k)$ is the ML estimate of the hypothesis at time $k$. The stopping
rule is defined as the first time $n$ for which

$$\log\left(\frac{p_{\hat{i}_n}(y^n, u^n)}{\max_{j \neq \hat{i}_n} p_j(y^n, u^n)}\right) \geq -\log(c), \tag{15}$$

where $c$ is a positive real-valued parameter that will be selected to approach zero in order to
drive the probabilities of error to zero. At the stopping time $n$, the decision rule is ML, i.e.,
$\delta(y^n, u^n) = \hat{i}_n$. Note that randomization is used in the causal control policy. This facilitates
the simultaneous minimization of the expected stopping time under the $M$ hypotheses as the
error probability goes to zero. Also, similar to the test proposed in the fixed sample size setting,
this sequential test relies on the separation principle between estimation and control, with the
distinction that the stationary mapping from the posterior distribution of the hypothesis to the
control value is now randomized.
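
The description above can be turned into a short simulation. The following is a minimal sketch under several assumptions that are not from the paper: the model is a hypothetical one with $M = 3$, binary observations and two controls satisfying (13); the max-min problem in (14) is approximated by a grid search over the pmf $q$ (it could equally be posed as a small linear program); and before any observation is available the ML estimate defaults to the smallest index, following the tie-breaking convention of Footnote 4.

```python
import math
import random
from itertools import product

# Hypothetical model: MODEL[u][i][y] is the probability of observing y
# under hypothesis i when control u is applied.
MODEL = {
    "a": [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
    "b": [[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]],
}


def kl(p, q):
    """KL distance D(p || q) between two pmfs on the same finite alphabet."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def simplex_grid(dim, steps):
    """All pmfs on dim points whose entries are multiples of 1/steps."""
    for head in product(range(steps + 1), repeat=dim - 1):
        if sum(head) <= steps:
            yield [h / steps for h in head] + [(steps - sum(head)) / steps]


def chernoff_control(i_hat, model, controls, steps=100):
    """Grid approximation of (14) for a given ML estimate i_hat."""
    num_hyp = len(model[controls[0]])
    best_q, best_val = None, -math.inf
    for q in simplex_grid(len(controls), steps):
        val = min(
            sum(qu * kl(model[u][i_hat], model[u][j]) for qu, u in zip(q, controls))
            for j in range(num_hyp) if j != i_hat
        )
        if val > best_val:
            best_q, best_val = q, val
    return best_q, best_val


def chernoff_test(true_i, model, c, rng, max_steps=100_000):
    """Simulate one run; returns (decision, stopping time, final log-ratio)."""
    controls = sorted(model)
    num_hyp = len(model[controls[0]])
    q_table = [chernoff_control(i, model, controls)[0] for i in range(num_hyp)]
    # Policy terms are common to all hypotheses, so only observation
    # log-likelihoods are tracked.
    loglik = [0.0] * num_hyp
    for n in range(1, max_steps + 1):
        i_hat = max(range(num_hyp), key=loglik.__getitem__)  # ties: smallest index
        u = rng.choices(controls, weights=q_table[i_hat])[0]
        y = rng.choices(range(len(model[u][true_i])), weights=model[u][true_i])[0]
        for i in range(num_hyp):
            loglik[i] += math.log(model[u][i][y])
        i_hat = max(range(num_hyp), key=loglik.__getitem__)
        gap = loglik[i_hat] - max(loglik[j] for j in range(num_hyp) if j != i_hat)
        if gap >= -math.log(c):  # stopping rule (15)
            return i_hat, n, gap
    raise RuntimeError("no decision within max_steps")


decision, stop_time, gap = chernoff_test(1, MODEL, 1e-3, random.Random(0))
print(decision, stop_time, gap)
```

The control distributions depend only on the current ML estimate, so they are computed once per hypothesis before the run starts, mirroring the remark that the mapping from the estimate to the control can be specified offline.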

To dispense with (13), we propose a "modified Chernoff test" with a control policy that is
slightly different from (14). Specifically, instead of using the policy (14) at all times, we will
occasionally draw the control uniformly at random from $\mathcal{U}$, independently of the index of the ML hypothesis;
the specific way in which this is done will be explained in Appendix B.II. The stopping rule of
this modified test will still be as in (15) with the same $c$ therein.

B. Asymptotic Optimality 

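In order to present a formal statement establishing the strong asymptotic optimality of the
Chernoff test, we introduce the concept of decision risks or frequentist error probabilities [18].
In particular, let $\pi(i)$, $i \in \mathcal{M}$, be a prior distribution of the hypothesis with full support. For
each $i \in \mathcal{M}$, the probability of incorrectly deciding $i$, or the risk of deciding $i$, is given by

$$R_i \triangleq \sum_{j \in \mathcal{M} \setminus \{i\}} \pi(j)\, P_j\{\delta = i\}. \tag{16}$$

Note that for each $i \in \mathcal{M}$,

$$R_i \leq \sum_{j \in \mathcal{M} \setminus \{i\}} \pi(j)\, P_j\{\delta \neq j\} \leq \max_{k \in \mathcal{M}} P_k\{\delta \neq k\}. \tag{17}$$

Therefore, the condition $\max_{k \in \mathcal{M}} P_k\{\delta \neq k\} \to 0$ implies that $\max_{k \in \mathcal{M}} R_k \to 0$.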
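
As a small numerical illustration of the risks, the sketch below computes each $R_i$ as the prior-weighted probability of wrongly deciding $i$, i.e., $\sum_{j \neq i} \pi(j) P_j\{\delta = i\}$ (the form that also appears on the left-hand side of (23)), and compares the largest risk with the maximal error probability. The prior and the table of decision probabilities are made-up numbers, and the function names are hypothetical.

```python
def decision_risks(prior, confusion):
    """Risk of deciding i for each i, where confusion[j][i] = P_j{delta = i}."""
    num_hyp = len(prior)
    return [
        sum(prior[j] * confusion[j][i] for j in range(num_hyp) if j != i)
        for i in range(num_hyp)
    ]


def max_error_probability(confusion):
    """max_k P_k{delta != k}, read off the diagonal of the confusion table."""
    return max(1.0 - row[k] for k, row in enumerate(confusion))


# Hypothetical prior with full support and hypothetical decision probabilities.
PRIOR = [0.5, 0.3, 0.2]
CONFUSION = [
    [0.97, 0.02, 0.01],  # row j = 0: P_0{delta = 0}, P_0{delta = 1}, P_0{delta = 2}
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]

RISKS = decision_risks(PRIOR, CONFUSION)
print(RISKS, max_error_probability(CONFUSION))
```

With these numbers every risk is well below the maximal error probability, which illustrates why a guarantee stated in terms of vanishing risks is the weaker requirement on a test, and hence why a converse stated in terms of risks is the stronger statement.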



Theorem 3: The modified Chernoff test (as $c \to 0$) satisfies

$$\lim_{c \to 0}\ \max_{i \in \mathcal{M}} P_i\left\{\delta(Y^N, U^N) \neq i\right\} = 0, \tag{18}$$

and for each $i \in \mathcal{M}$,

(19)

(20)

Furthermore, the modified Chernoff test is asymptotically optimal in the following strong sense.
If the prior $\pi$ has full support on $\mathcal{M}$, then any sequence of tests with vanishing maximal risk,
i.e., $\max_{k \in \mathcal{M}} R_k \to 0$, satisfies for every $i \in \mathcal{M}$,

$$E_i[N] \geq \frac{-\log\left(\max_{k \in \mathcal{M}} R_k\right)}{\max_{q(u)}\ \min_{j \in \mathcal{M} \setminus \{i\}}\ \sum_{u} q(u)\, D(p_i^u \| p_j^u)}\, (1 + o(1)). \tag{21}$$

Remark 4: The converse assertion (21) in terms of maximal risk implies the one in terms of
the maximal error probability, but not vice versa. Thus the asymptotic optimality of the modified
Chernoff test established in Theorem 3 is stronger than the corresponding result in [20], [21],
which is given in terms of maximal error probability.

C. Asymptotically Optimal Test Meeting Hard Constraints on the Risks 

Although the calculation of risks involves the prior distribution of the hypothesis, the test
proposed in Section IV-A does not use the knowledge of the prior distribution at all. In this
section, we show that by using this knowledge, we can further modify our test to meet hard
constraints on the risks. Another key to this new test is the use of different thresholds for the
peak of the posterior distribution depending on the index of the ML hypothesis instead of a
single threshold as in (15). In the asymptotic regime in which all the risks vanish, we show that
this modified test will also be asymptotically optimal.

Specifically, for a given tuple (R±, . . . , Rm) , we will design a test to satisfy Ri < Ri, i E M. 
To this end, we modify the stopping rule (TT3T > to be so that we stop at the first time n when 



tt(V) Pijy n ,u n ) \ ({M 

max 7r [j) pj (y n , u n 




(22) 



Theorem 4: For any tuple $(\bar{R}_1, \ldots, \bar{R}_M)$, $\bar{R}_i > 0$, $i \in \mathcal{M}$, and any $\pi$ with full support, the
modified Chernoff test with the stopping rule (22) in place of (15) satisfies, for every $i \in \mathcal{M}$,

$$\sum_{j \in \mathcal{M}\setminus\{i\}} \pi(j)\, \mathbb{P}_j\{\delta(Y^N, U^N) = i\} \le \bar{R}_i. \tag{23}$$

Furthermore, as $\max_i \bar{R}_i \to 0$ while satisfying $\max_i \bar{R}_i \le K \left(\min_j \bar{R}_j\right)$ for some constant $K$,
the proposed test is asymptotically optimal, i.e., it satisfies (19) and, hence, also (20).

V. Example 

Example 1: We consider an example with parameters $M = 3$, $\mathcal{Y} = \{0, 1\}$, $\mathcal{U} = \{a, b, c\}$.
For an arbitrary $\epsilon$, $0 < \epsilon < \frac{1}{2}$, denote by $p(y)$ and $\bar{p}(y)$ the two pmfs on $\mathcal{Y}$ for which
$p(1) = \epsilon$ and $\bar{p}(1) = 1 - \epsilon$, respectively. Then, consider the model for controlled sensing for
hypothesis testing in which the pmfs $p_i^u$, $i \in \{0, 1, 2\}$, $u \in \{a, b, c\}$, are assigned according
to Table 1.





             $u = a$      $u = b$      $u = c$
  $i = 0$    $p$          $\bar{p}$    $\bar{p}$
  $i = 1$    $\bar{p}$    $p$          $\bar{p}$
  $i = 2$    $\bar{p}$    $\bar{p}$    $p$

Table 1: Example

This example is motivated by adaptive sensor selection for event detection. Consider a sensor 
network with a fusion center and three sensors a, b and c, collecting measurements from three 
separate locations 0, 1, 2. A specific event takes place at exactly one unknown location; it affects
the distribution of the measurements at this particular location (represented by the distribution
$p$ in Table 1), while the measurements at the other two locations are distributed according to $\bar{p}$.
At every time step, the fusion center can query only one sensor to measure its readings. The 
goal is to determine the location of the event in the most efficient manner. 

The optimal exponent for open-loop control (cf. (10)) can be easily calculated to be

$$\beta_{OL} = \tfrac{2}{3}\, C(p, \bar{p}) = -\tfrac{2}{3} \log\left(2\sqrt{\epsilon(1-\epsilon)}\right). \tag{24}$$

For causal control, we apply the control policy presented in Section III-B2 (cf. (11)). Then,
by solving the maximization in (11), we obtain a deterministic causal control policy, which is
given by $u_{k+1} = f(\hat{i}_k)$, where $f(0) = a$, $f(1) = b$, $f(2) = c$. Lastly, at time $n$, the decision
is made for the maximum likelihood estimate, i.e., $\delta(y^n) = \hat{i}_n$. We now analyze the maximal
error probability of this test. To this end, for any $y^n$, we let

$$k_a = \left|\{k \in \{1, \ldots, n\} : u_k = a \text{ and } y_k = 1, \text{ or } u_k \ne a \text{ and } y_k = 0\}\right|. \tag{25}$$

Then, we get from Table 1 that $p_0(y^n) = \epsilon^{k_a}(1-\epsilon)^{n-k_a}$. Similarly, we can define $k_b$ and $k_c$
with $a$ in (25) replaced by $b$ and $c$, respectively, and get that $p_1(y^n) = \epsilon^{k_b}(1-\epsilon)^{n-k_b}$, $p_2(y^n) = \epsilon^{k_c}(1-\epsilon)^{n-k_c}$.
We sort $\{k_a, k_b, k_c\}$ in ascending order and denote the sorted values by $k_{(1)} \le k_{(2)} \le k_{(3)}$.
Note that at every time step, the most likely hypothesis is the one associated with $k_{(1)}$. Then,
it follows from Table 1 that as $n$ increases by one, if $y_n = 1$, then the least of $\{k_a, k_b, k_c\}$
increases by one, while the other two remain fixed. On the other hand, if $y_n = 0$, then the least
of $\{k_a, k_b, k_c\}$ remains fixed, while the other two increase by one. Hence, if we let $\tilde{k}$ denote
the number of zeros in $y^n$, then $k_a + k_b + k_c = n + \tilde{k}$. In addition, starting from no observation
at time zero, when $\{k_a, k_b, k_c\}$ are all equal to zero, we get from an induction argument that
$k_{(2)} \le k_{(3)} \le k_{(2)} + 1$. This argument is similar to that in [10, pp. 54]; we refer the reader to [10]
for further details. We can now conclude from these identities that

$$k_{(2)} \ge \frac{k_{(1)} + k_{(2)} + k_{(3)}}{3} - \frac{1}{3} = \frac{n + \tilde{k}}{3} - \frac{1}{3}. \tag{26}$$
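The counting argument above can be checked empirically. The following Python sketch simulates the causal policy "use the control matched to the current ML hypothesis" and verifies, at every step, the identity for the sum of the three counts, the gap between the two largest counts, and the resulting lower bound on the second smallest count. It is a minimal sketch under stated assumptions: the observation model (probability $\epsilon$ of observing 1 when the control matches the true hypothesis, and $1-\epsilon$ otherwise) is the reading of Table 1 that is consistent with (25), ties in the ML estimate are broken toward the lowest index, and all function names are hypothetical.

```python
import random


def simulate_counts(n, eps, true_hyp, seed=0):
    """Simulate u_{k+1} = f(ML estimate) for n steps.

    Hypotheses and controls are indexed 0, 1, 2 (control h stands for f(h)).
    Returns a list of (number_of_zeros_so_far, [k_a, k_b, k_c]) per step.
    """
    rng = random.Random(seed)
    k = [0, 0, 0]
    zeros = 0
    history = []
    for _ in range(n):
        ml = k.index(min(k))  # smallest count = largest likelihood (eps < 1/2)
        u = ml                # control matched to the ML hypothesis
        # Assumed model: P(y = 1) is eps if the control matches the truth.
        p_one = eps if u == true_hyp else 1.0 - eps
        y = 1 if rng.random() < p_one else 0
        if y == 0:
            zeros += 1
        for h in range(3):
            # Count definition as in (25), with a replaced by control h.
            if (u == h and y == 1) or (u != h and y == 0):
                k[h] += 1
        history.append((zeros, list(k)))
    return history


def invariants_hold(history):
    """Check the sum identity, the gap bound, and the bound on k_(2)."""
    for step, (zeros, k) in enumerate(history, start=1):
        k1, k2, k3 = sorted(k)
        if sum(k) != step + zeros:
            return False
        if not (k2 <= k3 <= k2 + 1):
            return False
        if 3 * k2 < step + zeros - 1:
            return False
    return True


print(invariants_hold(simulate_counts(n=200, eps=0.2, true_hyp=1, seed=1)))
```

The invariants are deterministic consequences of the policy, so they hold for every sample path; the random seed only selects which path is inspected.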

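At time $n$, $\delta(y^n)$ corresponds to the smallest of $\{k_a, k_b, k_c\}$. Writing $k_{(2)}(y^n)$ for the second smallest of these counts and $\tilde{k}$ for the number of zeros in $y^n$, it follows from (26) that for any $i = 0, 1, 2$,

$$\begin{aligned}
\mathbb{P}_i\{\delta \ne i\} &\le \sum_{y^n} \epsilon^{k_{(2)}(y^n)} (1-\epsilon)^{n - k_{(2)}(y^n)} \\
&= \sum_{\tilde{k}=0}^{n}\ \sum_{y^n :\, |\{k :\, y_k = 0\}| = \tilde{k}} \epsilon^{k_{(2)}(y^n)} (1-\epsilon)^{n - k_{(2)}(y^n)} \\
&\le \sum_{\tilde{k}=0}^{n} \binom{n}{\tilde{k}}\, \epsilon^{\frac{n+\tilde{k}}{3} - \frac{1}{3}} (1-\epsilon)^{\frac{2n - \tilde{k}}{3} + \frac{1}{3}} \\
&= \left(\frac{1-\epsilon}{\epsilon}\right)^{\frac{1}{3}} \left(\epsilon^{\frac{2}{3}}(1-\epsilon)^{\frac{1}{3}} + \epsilon^{\frac{1}{3}}(1-\epsilon)^{\frac{2}{3}}\right)^{n},
\end{aligned}$$

and we get that

$$\lim_{n \to \infty} -\frac{1}{n} \log\left(\max_{i=0,1,2} \mathbb{P}_i\{\delta \ne i\}\right) \ge -\log\left(\epsilon^{\frac{2}{3}}(1-\epsilon)^{\frac{1}{3}} + \epsilon^{\frac{1}{3}}(1-\epsilon)^{\frac{2}{3}}\right). \tag{27}$$

Comparing (27) with (24), the proposed causal control policy yields a strictly larger
error exponent than the best open-loop control. By the symmetry in Table 1, the upper bound
for $\beta_C$ in (12) can be calculated to be $C(p, \bar{p}) = -\log\left(2\sqrt{\epsilon(1-\epsilon)}\right)$.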
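To make the comparison between the three exponents concrete, the following Python sketch evaluates them for a few values of $\epsilon$. It assumes the closed forms that the example arrives at: two thirds of the Chernoff information for open-loop control, $-\log\left(\epsilon^{2/3}(1-\epsilon)^{1/3} + \epsilon^{1/3}(1-\epsilon)^{2/3}\right)$ for the causal policy, and the Chernoff information $-\log\left(2\sqrt{\epsilon(1-\epsilon)}\right)$ as the upper bound. The function names are hypothetical.

```python
import math


def chernoff_information(eps):
    """C(p, p_bar) for the two binary pmfs with P(1) = eps and 1 - eps."""
    return -math.log(2.0 * math.sqrt(eps * (1.0 - eps)))


def open_loop_exponent(eps):
    """Open-loop exponent, assumed to equal (2/3) C(p, p_bar)."""
    return (2.0 / 3.0) * chernoff_information(eps)


def causal_exponent(eps):
    """Exponent achieved by the causal policy of the example (assumed form)."""
    return -math.log(
        eps ** (2.0 / 3.0) * (1.0 - eps) ** (1.0 / 3.0)
        + eps ** (1.0 / 3.0) * (1.0 - eps) ** (2.0 / 3.0)
    )


for eps in (0.05, 0.1, 0.2, 0.4):
    print(
        f"eps={eps}: open-loop={open_loop_exponent(eps):.4f}, "
        f"causal={causal_exponent(eps):.4f}, "
        f"upper bound={chernoff_information(eps):.4f}"
    )
```

For every $\epsilon$ strictly between 0 and 1/2 the three values are strictly ordered, since $\epsilon^{1/3} + (1-\epsilon)^{1/3} < 2^{2/3}$ by the concavity of the cube root.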

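Lastly, we show that our lower bound for $\beta_C$ in (12) gives the same achievable error exponent
as in (27) for this example. To this end, we consider the argument of the $-\log$ in the lower bound:

$$\sup_{\nu} \min_{u} \max_{i} \left(\sum_{y} p_i^u(y)\, e^{\eta \frac{\nu \circ p^u(y) - p_i^u(y)}{1 - \nu(i)}}\right), \tag{28}$$

where $\nu \circ p^u(y) = \sum_j \nu(j) p_j^u(y)$. Note that the minimizer $u$ in (28) is a function of $\nu$. Hence, if the minimizer is replaced
by a specific function $u = f(\nu)$, then we will get a larger quantity, i.e.,

$$\sup_{\nu} \min_{u} \max_{i} \left(\sum_{y} p_i^u(y)\, e^{\eta \frac{\nu \circ p^u(y) - p_i^u(y)}{1 - \nu(i)}}\right) \le \sup_{\nu} \max_{i} \left(\sum_{y} p_i^{f(\nu)}(y)\, e^{\eta \frac{\nu \circ p^{f(\nu)}(y) - p_i^{f(\nu)}(y)}{1 - \nu(i)}}\right). \tag{29}$$

In particular, consider the following function:

$$f(\nu) = \begin{cases} a, & \arg\max_i \nu(i) = 0, \\ b, & \arg\max_i \nu(i) = 1, \\ c, & \arg\max_i \nu(i) = 2. \end{cases} \tag{30}$$

For an arbitrary $\nu(i)$, $i = 0, 1, 2$, denote their respective sorted values by $\nu_u \ge \nu_c \ge \nu_\ell$. Then,
it follows from (30) via appropriate algebraic manipulations using Table 1 that

$$\sup_{\nu} \max_{i} \left(\sum_{y} p_i^{f(\nu)}(y)\, e^{\eta \frac{\nu \circ p^{f(\nu)}(y) - p_i^{f(\nu)}(y)}{1 - \nu(i)}}\right) = \sup_{1 > \nu_u \ge \nu_c \ge \nu_\ell} \max \begin{pmatrix} (1-\epsilon)\, e^{-\eta(1-2\epsilon)} + \epsilon\, e^{\eta(1-2\epsilon)}, \\ (1-\epsilon)\, e^{-\frac{\eta(1-2\epsilon)\nu_u}{1-\nu_c}} + \epsilon\, e^{\frac{\eta(1-2\epsilon)\nu_u}{1-\nu_c}}, \\ (1-\epsilon)\, e^{-\frac{\eta(1-2\epsilon)\nu_u}{1-\nu_\ell}} + \epsilon\, e^{\frac{\eta(1-2\epsilon)\nu_u}{1-\nu_\ell}} \end{pmatrix}. \tag{31}$$

Next, we select $\eta = \frac{2 \log\left(\frac{1-\epsilon}{\epsilon}\right)}{3(1-2\epsilon)}$. Note that for any $\nu_u \ge \nu_c \ge \nu_\ell$,

$$\frac{1}{3} \le \frac{2\nu_u}{3(1-\nu_\ell)} \le \frac{2\nu_u}{3(1-\nu_c)} \le \frac{2}{3}. \tag{32}$$

It then follows from the selection of $\eta$, (32), and the fact that for any $0 < \epsilon < \frac{1}{2}$,

$$\max_{s \in [\frac{1}{3}, \frac{2}{3}]} \left[(1-\epsilon)^{1-s} \epsilon^{s} + (1-\epsilon)^{s} \epsilon^{1-s}\right] = (1-\epsilon)^{\frac{1}{3}} \epsilon^{\frac{2}{3}} + (1-\epsilon)^{\frac{2}{3}} \epsilon^{\frac{1}{3}},$$

that for any $\nu_u \ge \nu_c \ge \nu_\ell$,

$$\max\left((1-\epsilon) e^{-\gamma} + \epsilon e^{\gamma},\ (1-\epsilon) e^{-\gamma_c} + \epsilon e^{\gamma_c},\ (1-\epsilon) e^{-\gamma_\ell} + \epsilon e^{\gamma_\ell}\right) = (1-\epsilon)^{\frac{1}{3}} \epsilon^{\frac{2}{3}} + (1-\epsilon)^{\frac{2}{3}} \epsilon^{\frac{1}{3}},$$

where $\gamma = \frac{2 \log\left(\frac{1-\epsilon}{\epsilon}\right)}{3}$, $\gamma_c = \frac{2 \log\left(\frac{1-\epsilon}{\epsilon}\right) \nu_u}{3(1-\nu_c)}$, and $\gamma_\ell = \frac{2 \log\left(\frac{1-\epsilon}{\epsilon}\right) \nu_u}{3(1-\nu_\ell)}$. Following from (29) and (31) by
taking $-\log$, we get that

$$-\log\left(\sup_{\nu} \max_{i} \left(\sum_{y} p_i^{f(\nu)}(y)\, e^{\eta \frac{\nu \circ p^{f(\nu)}(y) - p_i^{f(\nu)}(y)}{1 - \nu(i)}}\right)\right) = -\log\left((1-\epsilon)^{\frac{2}{3}} \epsilon^{\frac{1}{3}} + (1-\epsilon)^{\frac{1}{3}} \epsilon^{\frac{2}{3}}\right),$$

as required. This lower bound matches the lower bound in (27).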
In the sequential setting, the quantities dictating the asymptotically optimal performance are
$\max_{q(u)} \min_{j \in \mathcal{M}\setminus\{i\}} \sum_u q(u) D(p_i^u \| p_j^u)$, the denominators on the right side of (19), which can readily

be computed for this example to be $D(p \| \bar{p}) = (1-2\epsilon) \log\frac{1-\epsilon}{\epsilon}$ for every $i \in \mathcal{M}$. The numerical value
of this quantity is, as expected, larger than $\beta_C$ in the fixed sample size setting, as now the control
has the additional capability to adaptively stop taking observations based on past observations.
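The max-min quantity governing the sequential setting can also be evaluated by brute force. The following Python sketch searches a grid on the probability simplex over the three controls and returns the best worst-case average KL divergence for a given true hypothesis. It is a sketch under stated assumptions: the model is the reading of Table 1 in which the pmf with $P(1) = \epsilon$ sits on the diagonal, pmfs are stored as (P(y=0), P(y=1)), and the function names and grid resolution are hypothetical. For this model the search returns the KL divergence between the two binary pmfs, $(1-2\epsilon)\log\frac{1-\epsilon}{\epsilon}$, attained by always using the control matched to the true hypothesis.

```python
import math
from itertools import product


def kl(p, q):
    """KL divergence D(p || q) between two pmfs given as tuples."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def example_model(eps):
    """model[i][u] = pmf of the observation under hypothesis i and control u."""
    p = (1.0 - eps, eps)      # P(y = 1) = eps
    p_bar = (eps, 1.0 - eps)  # P(y = 1) = 1 - eps
    return [[p if u == i else p_bar for u in range(3)] for i in range(3)]


def sequential_rate(eps, i, grid=20):
    """Grid search for max over q of min over j != i of sum_u q(u) D(p_i^u || p_j^u)."""
    model = example_model(eps)
    best = 0.0
    for na, nb in product(range(grid + 1), repeat=2):
        if na + nb > grid:
            continue
        q = (na / grid, nb / grid, (grid - na - nb) / grid)
        worst = min(
            sum(q[u] * kl(model[i][u], model[j][u]) for u in range(3))
            for j in range(3)
            if j != i
        )
        best = max(best, worst)
    return best


eps = 0.1
for i in range(3):
    print(i, sequential_rate(eps, i))
```

A grid search is used only because the example is tiny; for larger models the inner problem is a linear program in $q$ and a proper solver would be the natural choice.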

VI. Discussion 

In the proposed sequential test in Section IV, information from the past is used to form the
maximum likelihood estimate of the hypothesis, which is used in turn to select the maximizing
distribution and the maximizing control value in (14). In contrast to binary hypothesis testing with
a fixed sample size (cf. Proposition 1), information from the past seems to be crucial for attaining
the asymptotically optimal performance in the sequential setting, since the mentioned maximizers
can depend on the identity of the ML hypothesis even for the case of binary hypothesis testing.

VII. Conclusions 

We studied the structure of the optimal controller for multihypothesis testing with observation 
control under various asymptotic regimes. First, in a setting with a fixed sample size, the optimal 
error exponent corresponds to the maximum Chernoff information over the choice of controls 
for binary hypothesis testing. In particular, in this setup, a pure stationary open-loop control 
policy is asymptotically optimal even among the broader class of causal control policies. For 
multiple hypothesis testing, we characterized the optimal error exponent achievable by open-loop 
control and derived precise lower and upper bounds for the optimal error exponent achievable 
by causal control. We also proposed a causal control policy for multihypothesis testing based on 
maximizing the minimum Chernoff information of the distributions corresponding to the most 
likely hypothesis and all the alternative hypotheses. We illustrated through an example that the 
proposed causal control policy strictly outperforms the best open-loop control policy. 

Second, we considered a sequential setting wherein the objective is to minimize the expected 
stopping time subject to the constraints of vanishing error probabilities under each hypothesis. 
We proposed a suitably modified version of the Chernoff test for multiple hypothesis testing and
showed that it is asymptotically optimal in a strong sense, using the notion of decision making
risk instead of the overall probability of error. Our control policy is based on maximizing the
KL distance of the distributions corresponding to the most likely hypothesis and the nearest
alternative hypothesis. We also designed another sequential test to meet hard constraints on the 
risks while retaining the asymptotic optimality. 

For binary hypothesis testing, our findings show that past information is crucial for achieving
the asymptotically optimal performance in the sequential setting, while it is superfluous in
the fixed sample size setting. Our results also show that for general multiple hypothesis
testing, randomization in control is always superfluous (for any number of hypotheses) for
achieving the asymptotically optimal performance in the fixed sample size setting. On the other
hand, we showed that in the sequential setting, randomization facilitates an asymptotically
optimal control policy whose structure follows the separation principle between estimation and
control.

In our analysis we inherently assumed that the control actions were equally costly. We intend 
to study extensions to the case of non-uniform costs for the control actions in future work. It 
is also of interest to study the controlled sensing problem with incomplete knowledge of the 
probabilistic observation model. Another avenue for ongoing research seeks to explore whether 
the two-pronged approach of combining tools from stochastic control and information theory can 
be extended to other controlled sensing problems such as quickest change detection, parameter 
estimation, and learning-based classification. 

Appendix A. Proofs of Results in Section III
I. Proof of Theorem 1
We start with the achievability proof. First, note that for any n, and any test 

ieM \ ieM J 

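Fix a sequence $u^n \in \mathcal{U}^n$, and let $\delta_{ML} : \mathcal{Y}^n \to \mathcal{M}$ be the ML decision rule. It now follows that

$$\sum_{i \in \mathcal{M}} \mathbb{P}_i\{\delta_{ML}(Y^n) \ne i\} = \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}\setminus\{i\}} \mathbb{P}_i\{\delta_{ML}(Y^n) = j\}. \tag{A.2}$$

For any $i, j$, $0 \le i < j \le M-1$, and any $s \in [0, 1]$, we get that

$$\mathbb{P}_i\{\delta_{ML}(Y^n) = j\} \le \mathbb{P}_i\left\{p_i(Y^n)^{s}\, p_j(Y^n)^{1-s} \ge p_i(Y^n)\right\} \quad \text{and} \tag{A.3}$$

$$\mathbb{P}_j\{\delta_{ML}(Y^n) = i\} \le \mathbb{P}_j\left\{p_i(Y^n)^{s}\, p_j(Y^n)^{1-s} \ge p_j(Y^n)\right\}. \tag{A.4}$$

Combining (A.3) and (A.4), we obtain that

$$\begin{aligned}
\mathbb{P}_i\{\delta_{ML}(Y^n) = j\} + \mathbb{P}_j\{\delta_{ML}(Y^n) = i\} &\le \int_{y^n} \prod_{k=1}^{n} p_i^{u_k}(y_k)^{s}\, p_j^{u_k}(y_k)^{1-s} \prod_{k=1}^{n} d\mu_{u_k}(y_k) \\
&= \prod_{k=1}^{n} \int_{y_k} p_i^{u_k}(y_k)^{s}\, p_j^{u_k}(y_k)^{1-s}\, d\mu_{u_k}(y_k) \\
&= e^{\,n \left(\sum_u q(u) \log\left(\int_y p_i^u(y)^{s}\, p_j^u(y)^{1-s}\, d\mu_u(y)\right)\right)},
\end{aligned} \tag{A.5}$$

where $q(\cdot)$ denotes the empirical distribution of $u^n$: $q(u) = \frac{1}{n}\left|\{k : k \in \{1, \ldots, n\},\ u_k = u\}\right|$.
Since (A.5) is true for any $s \in [0, 1]$, we get that

$$\mathbb{P}_i\{\delta_{ML}(Y^n) = j\} + \mathbb{P}_j\{\delta_{ML}(Y^n) = i\} \le e^{-n \max_{s \in [0,1]} \left(-\sum_u q(u) \log\left(\int_y p_i^u(y)^{s}\, p_j^u(y)^{1-s}\, d\mu_u(y)\right)\right)}.$$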
Because there are only finitely many pairs of hypotheses in the sum on the right side of (A.2),
the pair corresponding to the smallest exponent will dominate the exponent. Hence, we get

i x — . — nlmin max — EqMIokI f pY(y) s p , ?(v) 1 ~ s dpu(y)) I 

Since $u^n$ is arbitrary, we can approximate any distribution $q(u)$ arbitrarily closely by the empirical
distribution $q^{(n)}(u)$ of an appropriate deterministic sequence $u^n$ such that $\max_u \left|q^{(n)}(u) - q(u)\right| \to 0$.
This fact combined with (A.1) yields that

$$\beta_{OL} \ge \max_{q(u)} \min_{i < j} \max_{s \in [0,1]} -\sum_u q(u) \log\left(\int_y p_i^u(y)^{s}\, p_j^u(y)^{1-s}\, d\mu_u(y)\right), \tag{A.6}$$

and that the error exponent on the right side of (A.6) is achievable by pure open-loop control.
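The right side of (A.6) is easy to evaluate numerically for small finite models. The following Python sketch does so by a grid search over the control distribution $q$ and the tilting parameter $s$, and applies it to a three-hypothesis, three-control binary model in which each hypothesis has one control producing a pmf with $P(1) = \epsilon$ and two controls producing a pmf with $P(1) = 1-\epsilon$. The grid sizes, the function names, and this particular model layout are assumptions made for illustration; the integrals reduce to finite sums because the observation alphabet is finite.

```python
import math
from itertools import combinations, product


def neg_log_tilted_sum(p, q, s):
    """-log sum_y p(y)^s q(y)^(1-s) for two pmfs given as tuples."""
    return -math.log(sum(a ** s * b ** (1.0 - s) for a, b in zip(p, q)))


def pair_exponent(model, q, i, j, s_grid=50):
    """max over s on a grid of sum_u q(u) * (-log sum_y p_i^u(y)^s p_j^u(y)^(1-s))."""
    return max(
        sum(
            q[u] * neg_log_tilted_sum(model[i][u], model[j][u], t / s_grid)
            for u in range(len(q))
        )
        for t in range(s_grid + 1)
    )


def open_loop_bound(model, q_grid=12):
    """Grid approximation of max_q min_{i<j} max_s in (A.6) for three controls."""
    pairs = list(combinations(range(len(model)), 2))
    best = 0.0
    for na, nb in product(range(q_grid + 1), repeat=2):
        if na + nb > q_grid:
            continue
        q = (na / q_grid, nb / q_grid, (q_grid - na - nb) / q_grid)
        best = max(best, min(pair_exponent(model, q, i, j) for i, j in pairs))
    return best


eps = 0.1
p, p_bar = (1.0 - eps, eps), (eps, 1.0 - eps)
model = [[p if u == i else p_bar for u in range(3)] for i in range(3)]
value = open_loop_bound(model)
print(value, (2.0 / 3.0) * -math.log(2.0 * math.sqrt(eps * (1.0 - eps))))
```

For this symmetric model the grid contains the uniform distribution and $s = 1/2$, which attain the maximum, so the two printed numbers agree up to floating-point error.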

Next, we prove the reverse inequality of (A.6). Since we proved that $\beta_{OL}$ is achievable by
pure control, we restrict our attention to pure open-loop control. By considering the necessary
and sufficient condition for the maximizing $s$ of the function



- X^ log ( / ( yk "> Sp T ( yk ^ " dflu * ^ ) ' 
k=i \ J yk ' 
we obtain for any $u^n \in \mathcal{U}^n$, and any pair of hypotheses $i, j \in \mathcal{M}$, that



argmax 

se[o,i] 



J^log ( / (VkYpf (Vk) 1 s dfi Uk J 

fe=l ^ffc ' 



(A.7) 



satisfies (cf.©) 



max — 

se[o,i] 



E lo § ( / (w)*Pi* (wo 1-5 ^ (wo) = E D = 5> (C" 

fc=l ^^ft ' fe=l fc=l 



(A.8) 



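We next consider, for the same pair of hypotheses $i, j$ as above, the pdf/pmf $\tilde{p}$ defined by
$\tilde{p}(y^n) = \prod_{k=1}^{n} b_{ij}^{u_k, s^*}(y_k)$. For any test $\delta$, it either holds that

$$\tilde{\mathbb{P}}\{\delta(Y^n) = i\} \ge \frac{1}{2} \quad \text{or that} \quad \tilde{\mathbb{P}}\{\delta(Y^n) \ne i\} \ge \frac{1}{2}. \tag{A.9}$$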

Suppose that the first case of (IA.9I) holds. For any causal control policy, under the stationary 
Markovity assumption and assumption (0, it follows that the random process Sk, k — 1, . . . , n, 



where 



P? (Yi) 



E 



log 



pT (Y) 



Ti- 



(A.10) 



is a "stable" martingale adapted to $\mathcal{F}_k$, the sigma fields generated by $(Y^k, U^k)$, $k = 1, \ldots, n$.
By the martingale stability theorem of Loève [27, pp. 53], we get that $\left\{\frac{1}{n} S_n\right\}_{n=1}^{\infty}$ converges to
zero a.s. and, hence, in probability, i.e., for any $\eta > 0$,



lim P J - V log 

n— too n \ 



LUfc,S 



k=l 



P? (Y k ) 



E 



log 



'i >:r'(Y 

P? (Y k ) 



j~k~i 



0. (A.ll) 



Since $U_k$, $k = 1, \ldots, n$, are fixed (pure open-loop control policy), we obtain from (A.11) that



lim P i - V ( log 



k=l 



pf (Y k 



D[b^ s \\pf) > V 



0. 



(A.12) 



The first inequality of (A.9) and (A.12) yield that for any $\epsilon' > 0$, any $\eta > 0$, and all $n$ large,



- - e' < P <( 5 (Y n ) 



\[p?(Y 



k) > e 



-MEM D M" ||p«* 



k=l 



k=l 



<P,{<5(y n )^j}e V*=i" 1 " 



(A.13) 
If the second case of (A.9) holds instead, then similar to (A.11) we obtain that



From the second case of (A.9) and (A.14), we obtain that for any $\epsilon' > 0$ and any $\eta > 0$,



which parallels (A.13). It now follows from (A.13), (A.15) and (A.8) that for any $i, j \in \mathcal{M}$,

lim -- log (max P 4 {5 ^ i} J < lim -- log (max (P; {5 (T n ) 7^ z} , P.- {5 (Y n ) ^ j})) 
n->oo n yieM J n-»oo 77, 

/ 1 n 1 " 

^ max U 5> n^ fc ) ^ £ D Or* M k ] 



k=l k=l 



- V - log ( / p* k (y k ) s * p] k (y k ) 1 s * dfi Uk (y k )) 

- g(«) log (7# (y) s * pJ (y) 1 ^ ^ (»)) 

max - ^ g(«) log ( fpt (y) s p] (y) 1 - 3 fi u (y)) , (A. 16) 



where $q$ denotes the empirical distribution of $u^n$. Since (A.16) must hold for every pair $i, j$ of
hypotheses, we then obtain that



lim — — log (max Pj {5 (Y n ) ^ i} J < min max — g(w) log ( / p% (y) s p u , (y) 1 s dfx u 

n->oo n \ieM J i<j se[0,l] ' 



(v) 



and, hence, 



max min max — Yg^logf Pi {y) s p*i {y) 1 s dfi u ). (A. 17) 

q(u) i<j \J y J 



Note that in (A.16), the empirical distribution $q(u)$ depends only on the pure control $u^n$ and not
on the pair of hypotheses $i, j$, while the maximizer $s^*$ in (A.7) depends both on $u^n$ and on the
pair of hypotheses. The assertion of Theorem 1 is now proved by combining (A.6) and (A.17).

II. Proof of Theorem 2
We first prove that $\beta_C$ is achievable by a pure control policy. For any fixed $n$, the problem of
finding the optimal causal control that minimizes the exact average probability of error can be cast
as a finite-horizon stochastic optimal control problem through the use of the posterior distribution
as a sufficient statistic. Since $\mathcal{U}$ is finite, it follows from a standard dynamic programming
argument [28] that the optimal causal control is a deterministic one.

Next, we prove the upper bound for $\beta_C$ in (12). Observe that for any test for $M$ hypotheses
with a decision rule $\delta$, and any pair of hypotheses $i, j \in \mathcal{M}$, a binary test for hypotheses $i$ and
$j$ can be constructed using the same control policy and an appropriate decision rule $\tilde{\delta}$ so that

$$\max\left(\mathbb{P}_i\{\tilde{\delta}(Y^n, U^n) \ne i\},\ \mathbb{P}_j\{\tilde{\delta}(Y^n, U^n) \ne j\}\right) \le \max_{k \in \mathcal{M}} \mathbb{P}_k\{\delta(Y^n, U^n) \ne k\}.$$

Applying the converse part of Theorem 1 with the roles of $\{p_0^u\}_{u \in \mathcal{U}}$ and $\{p_1^u\}_{u \in \mathcal{U}}$ therein being
played by $\{p_i^u\}_{u \in \mathcal{U}}$ and $\{p_j^u\}_{u \in \mathcal{U}}$, respectively, we obtain that

$$\beta_C \le \max_{u \in \mathcal{U}} \max_{s \in [0,1]} -\log\left(\int_y p_i^u(y)^{s}\, p_j^u(y)^{1-s}\, d\mu_u(y)\right).$$

As the previous argument applies for any $i \ne j$, $i, j \in \mathcal{M}$, we obtain the upper bound in (12)
by minimizing over all pairs of hypotheses $i, j \in \mathcal{M}$.

It is then only left to prove the lower bound for $\beta_C$ in (12). The proof relies on the following
lemma, whose proof is deferred to Appendix A.III.

Lemma 1: Let $J = |\mathcal{Y}|$. For every $\epsilon$, $0 < \epsilon < 1$, and $\eta > 0$, it holds that



sup mm max 



£p?(y) 



/ \ 

l + e(Jpt (y)-l) 



(A. 18) 



/ 



where the outer supremum on the left side of (A.18) is over the set of all pmfs $\nu$ on $\mathcal{M}$ that are
not point-mass distributions.

By L'Hôpital's rule, for every $\nu$ that is not a point-mass distribution,



lim 

e->0 



/ \ 

l + e(Jpf (y)-l) 



Jc 



(i-i/(<)) 



-»■ e 



»7(^°p"(i/)-p"(i/)) 

e . (A. 19) 



/ 



Consequently, by letting $\epsilon \to 0$, we get from Lemma 1 and (A.19) through the finiteness of
$\mathcal{M}$, $\mathcal{U}$, $\mathcal{Y}$ that for any $\eta > 0$,

$$-\log\left(\sup_{\nu} \min_{u} \max_{i} \left(\sum_{y} p_i^u(y)\, e^{\eta \frac{\nu \circ p^u(y) - p_i^u(y)}{1 - \nu(i)}}\right)\right) \le \beta_C.$$

The proof of Theorem 2 now follows by optimizing over $\eta > 0$.
III. Proof of Lemma 1

We shall consider a test based on a mismatched posterior distribution on the hypothesis. In
particular, the control value at every time is picked based on the posterior distribution on $\mathcal{M}$
computed from an appropriately chosen mismatched model $\{q_i^u\}_{i \in \mathcal{M}}^{u \in \mathcal{U}}$ (instead of the real
model $\{p_i^u\}_{i \in \mathcal{M}}^{u \in \mathcal{U}}$) and the uniform prior distribution on $\mathcal{M}$. Denote the posterior
probability of hypothesis $i \in \mathcal{M}$ at time $k = 0, \ldots, n$ by $\nu_k(i)$. Then,

$$\nu_0(i) = \frac{1}{M}, \qquad \nu_k(i) = \frac{\prod_{l=1}^{k} q_i^{u_l}(y_l)}{\sum_{j \in \mathcal{M}} \prod_{l=1}^{k} q_j^{u_l}(y_l)}, \quad 1 \le k \le n. \tag{A.20}$$

Also denote the likelihood ratio for hypothesis $i \in \mathcal{M}$ at time $k = 0, \ldots, n$ by $L_k(i)$, i.e.,

$$L_k(i) \triangleq \frac{\nu_k(i)}{1 - \nu_k(i)} = \frac{\nu_k(i)}{\sum_{j \ne i} \nu_k(j)},$$

$$L_0(i) = \frac{1}{M-1}, \qquad L_{k+1}(i) = L_k(i)\, \frac{(1 - \nu_k(i))\, q_i^{u_{k+1}}(y_{k+1})}{\sum_{j \ne i} \nu_k(j)\, q_j^{u_{k+1}}(y_{k+1})}, \quad 0 \le k \le n-1. \tag{A.21}$$

The decision rule at time $n$ is the maximum likelihood estimate of the hypothesis, i.e., $\delta(y^n) = \arg\max_i \nu_n(i)$. Next, we analyze the probability of error of such a test as a function of $\{q_i^u\}$,
$i \in \mathcal{M}$, $u \in \mathcal{U}$, and the pure control $u_k = u_k(\nu_{k-1}) = u_k(y^{k-1})$, which will be specified
later. We get that for any $\lambda < 0$, the probability of error (with respect to the real model
$\{p_i^u\}$, $i \in \mathcal{M}$, $u \in \mathcal{U}$) under hypothesis $i$ can be upper bounded as

$$\mathbb{P}_i\{\delta \ne i\} = \mathbb{P}_i\left\{\arg\max_j \nu_n(j) \ne i\right\} \le \mathbb{P}_i\{L_n(i) \le 1\} \le \mathbb{E}_i\left[L_n(i)^{\lambda}\right]. \tag{A.22}$$
Next, by writing



(A.23) 



and substituting (A.23) into (A.22), we get that for any $\lambda < 0$,



(M - 1)~ A . 



(A.24) 



We next specify the mismatched model $\{q_i^u\}$, $i \in \mathcal{M}$, $u \in \mathcal{U}$. For any $\epsilon$, $0 < \epsilon < 1$, consider
the conditional pmf $W_\epsilon(y | y')$, $y, y' \in \mathcal{Y}$, such that



(A.25) 



Then, let 



qUv) = P>W e (y) 4 ^^(y')W e (y|y') 



P? (2/) ( J 



1 , (J -1)6 



J 



i + j ( J P r (y) - 1) • (A.26) 



Using this particular $\{q_i^u\}$, $i \in \mathcal{M}$, $u \in \mathcal{U}$, with $\mathcal{F}_{k-1}$ denoting the sigma field generated by
$y^{k-1}$, $k = 1, \ldots, n$, we get from (A.21) through an easy algebraic manipulation that



E; 



XI pi k (y) 



^(1 - i/fe-i (y) 



E "fc-i (i) (y) 



/ 



l + e[J P ? [y)-l] 



i + 6 ( E (i - ^-i W)" 1 J^-i (j')p? (y) i 



where w fc = w fc {vk-i) — u k (y k Next, let A = — for an arbitrary 77 > 0, and let 



/ 



u IV 



argmin max Pi (y) 

u 1 v 



l + e(Jpf (sO-1) 



n 

Jt 



. (A.27) 
If we select the control to be $u_k = u^*(\nu_{k-1})$, where $u^*(\nu)$ is as in (A.27), then we get that for
any $i \in \mathcal{M}$, $k = 2, \ldots, n$, and any realization of $\nu_{k-1}$ (as a function of $y^{k-1}$),



E, 



'( L k (i) V* 




U*-i(«')y 





mm max 



/ 



i + e(Jp« (y)-i) 



3& 



Note that since $\nu_0$ (uniform) and all $q_i^u$, $i \in \mathcal{M}$, $u \in \mathcal{U}$, have full supports (cf. (A.26) upon
noting that $\epsilon < 1$), it follows that for every $k = 1, \ldots, n$, and every realization $y^k$, $\nu_k$
will have full support. With this observation, continuing from (A.24) by using the smoothing
property of conditional expectation, we get that



e ^ c <( max Pj {5 ^ i} )< sup min max ) (y) 

\ieM J „ u i ^— ' 



/ \ 

l + e(Jtf(y)-l) 



[7 

,7c 



The lemma follows by taking the limit as n — > oo. 



V 



1 + el ^. (i-i'W) ^ y 



x(M - 



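Appendix B. Proofs of Results in Section IV
I. The Converse Proof of Theorem 3
We now prove the assertion (21). To simplify notation, let

$$d_i^* \triangleq \max_{q(u)} \min_{j \in \mathcal{M}\setminus\{i\}} \sum_u q(u) D(p_i^u \| p_j^u), \quad \text{and} \quad Z_{ij}(n) \triangleq \log\frac{p_i(Y^n, U^n)}{p_j(Y^n, U^n)}. \tag{B.1}$$

It is not hard to see that (21) follows immediately from Lemma 2 below and the Markov inequality.

Lemma 2: For every $0 < \rho < 1$, any sequence of tests with vanishing maximal risk, i.e.,
$\max_{k \in \mathcal{M}} R_k \to 0$, satisfies

$$\mathbb{P}_i\left\{N > \frac{-\rho \log R_i}{d_i^* + 1 - \rho}\right\} \to 1, \quad \text{for every } i \in \mathcal{M}.$$

Lemma 2 in turn relies on the following lemma.

Lemma 3: For any sequence of tests with $\max_{k \in \mathcal{M}} R_k \to 0$, and any $0 < \rho < 1$, it holds that

$$\mathbb{P}_i\left\{Z_{ij}(N) \ge -\rho \log R_i\right\} \to 1, \quad \text{for each } j \in \mathcal{M}\setminus\{i\}. \tag{B.2}$$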
Proof: Define the subset $Q_n$ of the sample space as

$$Q_n = \left\{(y^n, u^n) : Z_{ij}(n) < -\rho \log R_i,\ \delta = i,\ N = n\right\}.$$

From the definition of $R_i$ in (16), for every $j \in \mathcal{M}\setminus\{i\}$, we have the following set of inequalities:

$$\frac{R_i}{\pi(j)} \ge \mathbb{P}_j\{\delta = i\} \ge \sum_{n=1}^{\infty} \mathbb{P}_j\{Q_n\} \ge R_i^{\rho} \sum_{n=1}^{\infty} \mathbb{P}_i\{Q_n\}, \tag{B.3}$$

where the third inequality follows from the fact that $Z_{ij}(n) < -\rho \log R_i$ on $Q_n$. Hence, for every
$i \ne j$, $i, j \in \mathcal{M}$,

$$\sum_{n=1}^{\infty} \mathbb{P}_i\{Q_n\} \le \frac{R_i^{1-\rho}}{\pi(j)}. \tag{B.4}$$

Thus,

$$\mathbb{P}_i\left\{Z_{ij}(N) < -\rho \log R_i\right\} \le \sum_{n=1}^{\infty} \mathbb{P}_i\{Q_n\} + \mathbb{P}_i\{\delta \ne i\} \le \frac{R_i^{1-\rho}}{\pi(j)} + \sum_{k \in \mathcal{M}\setminus\{i\}} \frac{R_k}{\pi(i)}. \tag{B.5}$$

The second inequality above follows from (B.4) and from the fact that $\mathbb{P}_i\{\delta = k\} \le \frac{R_k}{\pi(i)}$. The
right side of (B.5) goes to 0 since $R_k \to 0$ for each $k \in \mathcal{M}$. This proves Lemma 3. ∎
The following result follows from a standard martingale convergence argument as in Lemma 5
in [20] and is omitted due to space constraints.

For any $0 < \rho < 1$, it holds that

$$\lim_{n \to \infty} \mathbb{P}_i\left\{\max_{1 \le m \le n}\ \min_{j \in \mathcal{M}\setminus\{i\}} Z_{ij}(m) \ge n\left(d_i^* + 1 - \rho\right)\right\} = 0. \tag{B.6}$$

Combining the result in Lemma 3 and (B.6), we get for every $0 < \rho < 1$,

$$\mathbb{P}_i\left\{N \le \frac{-\rho \log R_i}{d_i^* + 1 - \rho}\right\} \to 0, \tag{B.7}$$

which is equivalent to the assertion of Lemma 2.

II. The Achievability Proof of Theorem 3 without Condition (13)

Because the instantaneous control picked in (14) is a function only of the identity of the
ML estimate of the hypothesis and not of the reliability of the estimate (e.g., the value of the
posterior probability of the ML hypothesis), the instantaneous control in (14) can be quite bad
when the ML estimate is incorrect. This can happen with large probability, especially when only a
few observations have been collected. Condition (13) essentially ensures that when the ML hypothesis
is incorrect, the control value of (14) will not be too bad. Consequently, (13) leads to a fast
convergence of the ML estimate of the hypothesis to the true one when the ML estimation is
used together with the control policy (14) at all times. Without (13), the convergence may not
happen or, even if it does, it may not be fast enough. This phenomenon is analogous to, and
tightly connected to, another one that occurs in a somewhat more exacerbated form in
stochastic adaptive control [29], illustrating the failure of ML identification in closed loop [30].

As previously mentioned at the end of Section IV-A, we slightly modify the control policy
(14) by occasionally sampling from the uniform control independently of the identity of the ML
hypothesis; this sparse sampling is used to guard against the event of incorrect ML estimation
of the hypothesis. Precisely, for some $a > 1$, at times $k = \lceil a^l \rceil$, $l = 0, 1, \ldots$, we let $U_{k+1}$ be
uniformly distributed on $\mathcal{U}$. At all other times, we still follow the control policy in (14). The
stopping rule is still as in (15), and the final decision is still ML. Without loss of generality, we
can assume that for every $i \ne j$, $i, j \in \mathcal{M}$, there exists a $u \in \mathcal{U}$ for which

$$D(p_i^u \| p_j^u) > 0; \tag{B.8}$$

otherwise, the probability of error can never be driven to zero. It now follows from (B.8) and
the argument as in the proof of [20, Lemma 1] that for every $i \ne j$, and all $n$ sufficiently large,

6 l£gn 



. k=l 



where Lk = log ^ %,|^ fc j J , for some b > 0, as we can only guarantee that E, 
for times in n time slots (precisely at those times when the control value is forced to be 
uniformly distributed). Let $T$ be the earliest time such that the ML estimate of the hypothesis
equals the true hypothesis for all times $k \ge T$. Then, we get that for all sufficiently large $k$,

Fi{T>k} < Mj2 e ~ b ^ < 0(£T 7 ) (B.9) 

t>k 

for an arbitrarily large $\gamma$ when $a$ is chosen to be sufficiently close to 1. Note that it was shown
in [20, Lemma 1] that if (13) holds, then $\mathbb{P}_i\{T > k\}$ decays exponentially.
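The sparsity of the forced uniform sampling can be illustrated with a few lines of Python. The sketch below lists the times of the form $\lceil a^l \rceil$, $l = 0, 1, \ldots$, that do not exceed a horizon $n$; their number grows only logarithmically in $n$, which is why the exploration does not affect the first-order expected sample size, while choosing $a$ closer to 1 makes exploration more frequent. The function name is hypothetical, and duplicate times (which occur when $a$ is close to 1) are merged as an implementation choice.

```python
import math


def exploration_times(n, a):
    """Distinct times k = ceil(a**l), l = 0, 1, ..., with k <= n (requires a > 1)."""
    if a <= 1:
        raise ValueError("a must be strictly greater than 1")
    times = []
    l = 0
    while True:
        k = math.ceil(a ** l)
        if k > n:
            break
        if not times or k != times[-1]:
            times.append(k)
        l += 1
    return times


print(exploration_times(20, 2.0))
print(len(exploration_times(10**6, 2.0)), len(exploration_times(10**6, 1.1)))
```

At each listed time $k$ the next control $U_{k+1}$ would be drawn uniformly from the control set; at all other times the ML-based policy is used.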



e 2 J 



k-i 



< 1 
Our achievability proof of asymptotic optimality without (13), i.e., that the modified test
satisfies (19) without imposing (13), follows closely the steps in the proof of [20, Lemma 2]
under assumption (13). Due to space limitations, we shall just emphasize the key steps and point
out the differences from the proof when (13) is relaxed. To this end, we denote the maximizers
in the denominator on the right side of (19) by $q_i^*(u)$.

Referring to the stopping rule in (15), we see that the stopping time depends on the time needed
for the log-likelihood ratio (LLR) corresponding to the closest alternative hypothesis to cross
the stopping threshold $-\log c$. Thus, the main idea is to show that the LLR per observation
concentrates around the denominator on the right side of (19) for the control policy described
above. The key step in the proof of (19) deals with the following decomposition for an arbitrary
hypothesis $j \ne i$, where $i$ is the true hypothesis:

$$\frac{1}{n} \sum_{k=1}^{n} L_k = \frac{1}{n} \sum_{k=1}^{n} \left\{L_k - \mathbb{E}_i\left[L_k | \mathcal{F}_{k-1}\right]\right\} + \frac{1}{n} \sum_{k=1}^{n} \left\{\mathbb{E}_i\left[L_k | \mathcal{F}_{k-1}\right] - \sum_u q_i^*(u) D(p_i^u \| p_j^u)\right\} + \sum_u q_i^*(u) D(p_i^u \| p_j^u). \tag{B.10}$$

The proof of the measure concentration then boils down to proving that the two averages of
the bracketed quantities concentrate around 0 from the negative side with a sufficiently quick
decay of the probability of non-concentration. In particular, it suffices to prove that the following
two sequences of probabilities (as functions of $n$) go to zero sufficiently fast:

$$\mathbb{P}_i\left\{\frac{1}{n} \sum_{k=1}^{n} \left\{L_k - \mathbb{E}_i\left[L_k | \mathcal{F}_{k-1}\right]\right\} < -\epsilon\right\} \tag{B.11}$$

and

$$\mathbb{P}_i\left\{\frac{1}{n} \sum_{k=1}^{n} \left\{\mathbb{E}_i\left[L_k | \mathcal{F}_{k-1}\right] - \sum_u q_i^*(u) D(p_i^u \| p_j^u)\right\} < -\epsilon\right\}. \tag{B.12}$$

Note that the minimum value of the third term in the decomposition in (B.10) over $j \ne i$ is
precisely the denominator on the right side of (19).

The same argument leading to [20, Equation (5.10)] gives that (B.11) goes to zero exponentially.
Also, (B.9) implies a polynomial decay of (B.12), as with probability 1,

$$\left|\sum_{k=1}^{n} \left\{\mathbb{E}_i\left[L_k | \mathcal{F}_{k-1}\right] - \sum_u q_i^*(u) D(p_i^u \| p_j^u)\right\}\right| \le C' \min(T, n) + C'' \log n,$$

for some constants $C'$, $C''$, by virtue of the fact that $q = q_i^*$ for each $k > T$ such that $k \ne \lceil a^l \rceil$, $l \ge 1$
(cf. the definition of $T$ above). This will lead us to [20, Equation (5.9)] but only with a
polynomial decay (with an arbitrarily high degree $\gamma$ in (B.9)) in the probability on the right side
of the equation. Nevertheless, the sufficiently quick polynomial decay in the probability therein
still enables us to complete the steps at the beginning of the proof of [20, Lemma 2] to eventually
upper bound the asymptotics of the expected sample sizes as in (19).

Proof of Theorem 4
We first prove (23). Let $i$ be the true hypothesis. For any $j \in \mathcal{M}$, $j \ne i$, consider the event

$$A_n^j = \left\{(y^n, u^n) : N = n,\ \delta = j\right\}. \tag{B.13}$$

Following the approach in [31], on the set $A_n^j$ we have the following set of inequalities:

log > ,og ( ^?f:: n \] > log (i^m) . (B.i4, 

The last inequality above follows since the test ends at $n$ and the stopping criterion must be met
for the choice of the thresholds in (22). Thus,

R ■ 

11 " J/ - (M-l)7r(i) n J/ 

It now follows that 

00 5 00 5 

Pi{S = j} = g Pj{ ^, } < _i_g Pj{ ^, } < (B ,5) 

From the definition of $R_j$ in (16), we then get that $R_j \le \bar{R}_j$. The result holds for each $j \in \mathcal{M}$.
The last assertion of Theorem 4 pertaining to the asymptotic optimality of the proposed test follows
by considering yet another test with the stopping rule (22) replaced by the following
stopping rule with a single threshold:

£ '-(^tef ))■ 

and with the same control and decision rule as those of the proposed test. It follows from (22)
and (B.16) that the stopping time of this new test always dominates (i.e., is no smaller than) that of the
proposed test a.s. Let us denote the stopping times of the proposed test and of this new test by $N$ and $N'$, respectively. Since $\pi$ has
full support, as $\max_i \bar{R}_i \to 0$, the single threshold on the right side of (B.16) will go to infinity.
By Theorem 3, this new test with the single threshold is asymptotically optimal, i.e., it satisfies,
for every $i \in \mathcal{M}$,

$$\lim_{\max_i \bar{R}_i \to 0} \frac{\mathbb{E}_i[N']}{-\log c} \le \frac{1}{\max_{q(u)} \min_{j \in \mathcal{M}\setminus\{i\}} \sum_u q(u) D(p_i^u \| p_j^u)}, \tag{B.17}$$

where c = jj—^ fniin On the other hand, it follows from (IB .15b and the assumption in 

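the statement of Theorem 4 that

$$\max_{i} \mathbb{P}_i\{\delta \ne i\} \le \max_{i \ne j} \frac{\bar{R}_j}{\pi(i)} \le K' c, \tag{B.18}$$

for a suitable constant $K'$. The aforementioned dominance, i.e., $N \le N'$ a.s., and (B.18) along
with (B.17) give that for every $i \in \mathcal{M}$,

$$\lim_{\max_i \bar{R}_i \to 0} \frac{\mathbb{E}_i[N]}{-\log\left(\max_k \mathbb{P}_k\{\delta \ne k\}\right)} \le \lim_{\max_i \bar{R}_i \to 0} \frac{\mathbb{E}_i[N']}{-\log c} \le \frac{1}{\max_{q(u)} \min_{j \in \mathcal{M}\setminus\{i\}} \sum_u q(u) D(p_i^u \| p_j^u)}.$$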